{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "fc37c39a-7406-4c13-a754-b8e95fd970a0",
   "metadata": {},
   "source": [
    "# How to stream responses from an LLM\n",
    "\n",
    "All `LLM`s implement the [Runnable interface](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable), which comes with **default** implementations of standard runnable methods (i.e. `ainvoke`, `batch`, `abatch`, `stream`, `astream`, `astream_events`).\n",
    "\n",
    "The **default** streaming implementations provide an`Iterator` (or `AsyncIterator` for asynchronous streaming) that yields a single value: the final output from the underlying chat model provider.\n",
    "\n",
    "The ability to stream the output token-by-token depends on whether the provider has implemented proper streaming support.\n",
    "\n",
    "See which [integrations support token-by-token streaming here](/docs/integrations/llms/).\n",
    "\n",
    "\n",
    "\n",
    ":::{.callout-note}\n",
    "\n",
    "The **default** implementation does **not** provide support for token-by-token streaming, but it ensures that the model can be swapped in for any other model as it supports the same standard interface.\n",
    "\n",
    ":::"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f13124a-7f9d-404f-b7ac-70d8ea49ef8e",
   "metadata": {},
   "source": [
    "## Sync stream\n",
    "\n",
    "Below we use a `|` to help visualize the delimiter between tokens."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "9baa0527-b97d-41d3-babd-472ec5e59e3e",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "|Spark|ling| water|,| oh| so clear|\n",
      "|Bubbles dancing|,| without| fear|\n",
      "|Refreshing| taste|,| a| pure| delight|\n",
      "|Spark|ling| water|,| my| thirst|'s| delight||"
     ]
    }
   ],
   "source": [
    "from langchain_openai import OpenAI\n",
    "\n",
    "llm = OpenAI(model=\"gpt-3.5-turbo-instruct\", temperature=0, max_tokens=512)\n",
    "for chunk in llm.stream(\"Write me a 1 verse song about sparkling water.\"):\n",
    "    print(chunk, end=\"|\", flush=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "596e477b-a41d-4ff5-9b9a-a7bfb53c3680",
   "metadata": {},
   "source": [
    "## Async streaming\n",
    "\n",
    "Let's see how to stream in an async setting using `astream`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "d81140f2-384b-4470-bf93-957013c6620b",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "|Spark|ling| water|,| oh| so clear|\n",
      "|Bubbles dancing|,| without| fear|\n",
      "|Refreshing| taste|,| a| pure| delight|\n",
      "|Spark|ling| water|,| my| thirst|'s| delight||"
     ]
    }
   ],
   "source": [
    "from langchain_openai import OpenAI\n",
    "\n",
    "llm = OpenAI(model=\"gpt-3.5-turbo-instruct\", temperature=0, max_tokens=512)\n",
    "async for chunk in llm.astream(\"Write me a 1 verse song about sparkling water.\"):\n",
    "    print(chunk, end=\"|\", flush=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ab11306-b0db-4459-a9de-ecefb821c9b1",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Async event streaming\n",
    "\n",
    "\n",
    "LLMs also support the standard [astream events](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable.astream_events) method.\n",
    "\n",
    ":::{.callout-tip}\n",
    "\n",
    "`astream_events` is most useful when implementing streaming in a larger LLM application that contains multiple steps (e.g., an application that involves an `agent`).\n",
    ":::"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "399d74c7-4438-4093-ae05-47fed0255626",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from langchain_openai import OpenAI\n",
    "\n",
    "llm = OpenAI(model=\"gpt-3.5-turbo-instruct\", temperature=0, max_tokens=512)\n",
    "\n",
    "idx = 0\n",
    "\n",
    "async for event in llm.astream_events(\n",
    "    \"Write me a 1 verse song about goldfish on the moon\", version=\"v1\"\n",
    "):\n",
    "    idx += 1\n",
    "    if idx >= 5:  # Truncate the output\n",
    "        print(\"...Truncated\")\n",
    "        break\n",
    "    print(event)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
